Modified Shine-Dalgarno sequences and methods of use thereof

ABSTRACT

Novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors are provided. Methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems are also provided. In particular embodiments of the invention, compounds and methods for high efficiency production of soluble protein in prokaryotic systems are provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application Ser. No. PCT/US03/19786, filed Jun. 25, 2003, which claims benefit under 35 U.S.C. § 119(e) to U.S. Provisional Application Ser. Nos. 60/391,433, filed Jun. 26, 2002, and 60/406,630, filed Aug. 29, 2002, each of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors. The present invention also relates to methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems and, in one aspect of the invention, provides for high efficiency production of soluble protein in prokaryotic systems.

BACKGROUND OF THE INVENTION

The level of production of a protein in a host cell is determined by three major factors: the number of copies of its structural gene within the cell, the efficiency with which the structural gene copies are transcribed, and the efficiency with which the resulting messenger RNA (“mRNA”) is translated. The transcription and translation efficiencies are, in turn, dependent on nucleotide sequences that are normally situated upstream of the desired structural genes or the translated sequence. These nucleotide sequences, also known as expression control sequences, define, inter alia, the locations at which RNA polymerase binds to initiate transcription (the promoter sequence; see also EMBO J. 5:2995–3000 (1986)) and at which ribosomes bind and interact with the mRNA (the product of transcription) to initiate translation.

In most prokaryotes, the purine-rich ribosome binding site known as the Shine-Dalgarno (S-D) sequence assists with the binding and positioning of the 30S ribosomal subunit relative to the start codon on the mRNA through interaction with a pyrimidine-rich region of the 16S ribosomal RNA. See, e.g., Shine & Dalgarno, Proc. Natl. Acad. Sci. USA 71:1342–46 (1974). The S-D sequence is located on the mRNA downstream of the start of transcription and upstream of the start of translation, typically 4–14 nucleotides upstream of the start codon and more typically 8–10 nucleotides upstream of the start codon. Because of the role of the S-D sequence in translation, there is a direct relationship between the efficiency of translation and the efficiency (or strength) of the S-D sequence.
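The positional rule described above can be illustrated with a short Python sketch that scans a transcript for a purine-rich motif and counts the nucleotides separating its 3′ end from the first downstream AUG. The canonical core motif AGGAGG, the example transcript, and the function names `sd_spacing` and `classify_spacing` are illustrative assumptions only; the S-D sequences of the invention (SEQ ID NO:2 and SEQ ID NO:18) are not reproduced here, and real S-D recognition tolerates mismatches that an exact string search ignores.

```python
from typing import Optional

# Assumed canonical S-D core for illustration; NOT SEQ ID NO:2 or SEQ ID NO:18.
CANONICAL_SD = "AGGAGG"


def sd_spacing(mrna: str, motif: str = CANONICAL_SD, start_codon: str = "AUG") -> Optional[int]:
    """Return the number of nucleotides between the 3' end of the first exact
    motif match and the next downstream start codon, or None if either is absent."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    motif = motif.upper().replace("T", "U")
    sd_pos = seq.find(motif)
    if sd_pos == -1:
        return None
    sd_end = sd_pos + len(motif)
    start_pos = seq.find(start_codon, sd_end)
    if start_pos == -1:
        return None
    return start_pos - sd_end


def classify_spacing(spacing: Optional[int]) -> str:
    """Label a spacing using the ranges given in the text (4-14 nt typical, 8-10 nt more typical)."""
    if spacing is None:
        return "no S-D/start pair found"
    if 8 <= spacing <= 10:
        return "most typical (8-10 nt)"
    if 4 <= spacing <= 14:
        return "typical (4-14 nt)"
    return "outside typical range"


# Hypothetical 5' UTR fragment: EcoRI-like site, motif, 8-nt spacer, then AUG.
example = "GAAUUCAGGAGGUAAUACAUAUGGCU"
print(sd_spacing(example), classify_spacing(sd_spacing(example)))
```

An exact-match search was chosen for clarity; a scoring scheme based on complementarity to the 16S rRNA would be a more realistic, but more assumption-laden, extension.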

Not all S-D sequences have the same efficiency, however. Accordingly, prior attempts have been made to increase the efficiency of ribosomal binding, positioning, and translation by, inter alia, changing the distance between the S-D sequence and the start codon, changing the composition of the spacer between the S-D sequence and the start codon, modifying an existing S-D sequence, using a heterologous S-D sequence, and manipulating the secondary structure of the mRNA during the initiation of translation. Despite these efforts, however, increasing protein expression efficiency in prokaryotic systems has remained an elusive and unpredictable goal due to a variety of factors, including, inter alia, the host cells used, the expression control sequences (including the S-D sequence) used, and the characteristics of the gene and protein being expressed. See, e.g., Stenstrom et al., Gene 273(2):259–265 (2001); Komarova et al., Bioorg. Khim. 27(4):282–290 (2001); Stenstrom et al., Gene 263(1–2):273–284 (2001); and Mironova et al., Microbiol. Res. 154(1):35–41 (1999). For example, efficient expression of soluble B. anthracis protective antigen (PA) has proved difficult in E. coli. See, e.g., Sharma et al., Protein Expression and Purification 7:33–38 (1996) (reporting 0.5 mg/L at 70% purity); Chauhan et al., Biochem. Biophys. Res. Commun. 283(2):308–15 (2001) (reporting 125 mg/L); Gupta et al., Protein Expr. Purif. 16(3):369–76 (1999) (reporting 2 mg/L).

Accordingly, there remains a demand in the art for compositions and methods for increasing the efficiency of ribosome binding and translation in prokaryotic systems, thereby increasing the efficiency of protein expression. This demand is especially strong for proteins that are difficult to express in existing systems and for proteins that are desired in large quantity for pharmacological, therapeutic, or industrial use.

SUMMARY OF THE INVENTION

The present invention encompasses novel Shine-Dalgarno sequences that result in increased efficiency of protein expression in prokaryotic systems. The present invention further relates to vectors comprising such S-D sequences and host cells transformed with such vectors. In particular embodiments, the present invention relates to methods for producing proteins and fragments thereof in prokaryotic systems using such S-D sequences, vectors, and host cells. In certain embodiments, methods of use of the S-D sequences, vectors, and host cells of the invention provide high efficiency production of soluble protein in prokaryotic systems, including prokaryotic in vitro translation systems.

In particular embodiments of the invention, the novel S-D sequence comprises (or alternately consists of) SEQ ID NO:2. In additional embodiments, the novel S-D sequence comprises (or alternately consists of) nucleotides 4–13 of SEQ ID NO:2. The invention also encompasses the S-D sequence of SEQ ID NO:18, described at paragraph 0426 of U.S. Provisional Application No. 60/368,548, filed Apr. 1, 2002, and in U.S. Provisional Application No. 60/331,478, filed Nov. 16, 2001, each of which is hereby incorporated by reference herein in its entirety.

The protein or fragment thereof may be of prokaryotic, eukaryotic, or viral origin, or may be artificial. In particular embodiments, the S-D sequences, vectors, and host cells of the invention are used to express B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (see, e.g., Sellman et al., J. Biol. Chem. 276(11):8371–8376 (2001)), TL3, TL6, or other proteins. In certain embodiments, the S-D sequences, vectors, and host cells of the invention are used to express proteins that have previously been difficult to express in prokaryotic systems. The present invention also encompasses the combination of the novel S-D sequences with a variety of expression control sequences, such as those described in detail in U.S. Pat. No. 6,194,168 (which is hereby incorporated by reference herein in its entirety), and in particular, expression control sequences comprising at least a portion of one or more lac operator sequences and a phage promoter comprising a −30 region.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a Shine-Dalgarno sequence of the present invention (SEQ ID NO:2) and the Shine-Dalgarno sequence contained in the pHE4 expression vector (SEQ ID NO:17) (see U.S. Pat. No. 6,194,168). Bases matching the S-D sequence of the present invention (SEQ ID NO:2) are highlighted.

FIG. 2A depicts a map of the pHE6 vector (SEQ ID NO:1), which incorporates an S-D sequence of the invention. FIG. 2B depicts the pHE6 vector (SEQ ID NO:1) with the gene encoding mature Bacillus anthracis PA, including an ETB signal sequence (SEQ ID NO:3), inserted.

FIGS. 3A–3B compare the efficiency of TL6 protein expression using the pHE4 vector (FIG. 3B) versus the pHE6 vector (FIG. 3A), which uses an S-D sequence of the invention. In particular, increased soluble TL6 expression with the pHE6 vector can be seen in FIG. 3A as a lack of “shadow” in the gel.

FIG. 4 depicts a gel showing the quantity and quality of PA after expression using pHE6 and subsequent purification. Using the compositions and methods of the invention, approximately 150 mg/L of soluble PA at greater than 96% purity (as measured by RP-HPLC) was obtained.

DETAILED DESCRIPTION OF THE INVENTION

The instant invention is directed to novel Shine-Dalgarno (ribosome binding site) sequences. These S-D sequences result in increased efficiency of protein expression in prokaryotic systems. The S-D sequences of the present invention have been optimized through modification of several nucleotides. See, e.g., FIG. 1. In particular embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:2. In additional embodiments, the S-D sequences of the present invention comprise (or alternately consist of) nucleotides 4–13 of SEQ ID NO:2. In other embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:18.

In many embodiments, the S-D sequences of the present invention are used in prokaryotic cells. Exemplary bacterial cells suitable for use with the instant invention include E. coli, B. subtilis, S. aureus, S. typhimurium, and other bacteria used in the art. In other embodiments, the S-D sequences of the present invention are used in prokaryotic in vitro transcription and translation systems.

The present invention also relates to vectors and plasmids comprising one or more S-D sequences of the invention. Such vectors and plasmids generally further comprise one or more restriction enzyme sites downstream of the S-D sequence for cloning and expression of a gene or polynucleotide of interest.

In certain embodiments, vectors and plasmids of the present invention further comprise additional expression control sequences, including but not limited to those described in U.S. Pat. No. 6,194,168, and in particular, M (SEQ ID NO:5), M+D (SEQ ID NO:6), U+D (SEQ ID NO:7), M+D1 (SEQ ID NO:8), and M+D2 (SEQ ID NO:9). More generally, the expression control sequence elements contemplated include bacterial or phage promoter sequences and functional variants thereof, whether natural or artificial; operator/repressor systems; and the lacIq gene (which confers tight regulation of the lac operator by blocking transcription of downstream (i.e., 3′) sequences).

The lac operator sequences contemplated for use in vectors and plasmids of the instant invention comprise (or alternately consist of) the entire lac operator sequence represented by the sequence 5′-AATTGTGAGCGGATAACAATTTCACACA-3′ (SEQ ID NO:10), or a portion thereof that retains at least partial activity, as described in U.S. Pat. No. 6,194,168. Activity is routinely determined using techniques well known in the art to measure the relative repressibility of a promoter sequence in the absence of an inducer, such as IPTG. This is done by comparing the relative amounts of protein expressed from expression control sequences comprising portions of the lac operator sequence with the amounts expressed from sequences comprising the full-length lac operator sequence; the activity of the partial operator sequence is thus measured relative to that of the full-length lac operator sequence (e.g., SEQ ID NO:10). In one embodiment, partial activity for the purposes of the present invention means activity reduced by no more than 100 fold relative to the full-length sequence. In alternative embodiments, partial activity means activity reduced by no more than 75, 50, 25, 20, 15, or 10 fold relative to the full-length lac operator sequence. In a preferred embodiment, the activity of a partial operator sequence is reduced by no more than 10 fold relative to the activity of the full-length sequence.
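The fold-reduction thresholds above reduce to a ratio comparison. The Python sketch below is one possible reading, assuming that "activity" is expressed as a repression factor (expression with inducer divided by expression without it); that definition, the function names, and the example expression values are hypothetical and are not taken from U.S. Pat. No. 6,194,168.

```python
from typing import List, Sequence

# Thresholds recited in the text ("reduced by no more than N fold").
THRESHOLDS: Sequence[int] = (100, 75, 50, 25, 20, 15, 10)


def repression_factor(induced: float, uninduced: float) -> float:
    """Assumed activity measure: expression with inducer (e.g., IPTG) / expression without it."""
    if uninduced <= 0:
        raise ValueError("uninduced expression must be positive")
    return induced / uninduced


def fold_reduction(full_activity: float, partial_activity: float) -> float:
    """How many fold the partial operator's activity falls below the full-length operator's."""
    return full_activity / partial_activity


def thresholds_met(fold: float, thresholds: Sequence[int] = THRESHOLDS) -> List[int]:
    """Return every 'no more than N fold' threshold that this fold reduction satisfies."""
    return [n for n in thresholds if fold <= n]


# Hypothetical readings in arbitrary units: the partial operator leaks more without IPTG.
full = repression_factor(induced=1000.0, uninduced=1.0)     # 1000
partial = repression_factor(induced=1000.0, uninduced=40.0)  # 25
fold = fold_reduction(full, partial)
print(fold, thresholds_met(fold))
```

Under this reading, a 40-fold reduction would satisfy the 100-, 75-, and 50-fold embodiments but not the preferred 10-fold embodiment.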

In many embodiments, one or more S-D sequences of the invention are used in a vector comprising a T5 phage promoter sequence and two lac operator sequences, wherein at least a portion of the full-length lac operator sequence (SEQ ID NO:10) is located within the spacer region between −12 and −30 of the expression control sequences described in U.S. Pat. No. 6,194,168. In particular embodiments, the operator sequence comprises (or alternately consists of) at least the sequence 5′-GTGAGCGGATAACAAT-3′ (SEQ ID NO:11).
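As a consistency check on the two operator sequences quoted in this section, the following Python sketch confirms that the minimal operator of SEQ ID NO:11 lies entirely within the full-length lac operator of SEQ ID NO:10 and reports its position. Only the two sequences come from the text; the function name `locate_core` is hypothetical.

```python
from typing import Tuple

LAC_OPERATOR_FULL = "AATTGTGAGCGGATAACAATTTCACACA"  # SEQ ID NO:10, as quoted in the text
LAC_OPERATOR_CORE = "GTGAGCGGATAACAAT"              # SEQ ID NO:11, as quoted in the text


def locate_core(full: str, core: str) -> Tuple[int, int]:
    """Return the 1-based, inclusive (start, end) of core within full; raise if absent."""
    idx = full.upper().find(core.upper())
    if idx == -1:
        raise ValueError("core sequence not found in full-length operator")
    return idx + 1, idx + len(core)


start, end = locate_core(LAC_OPERATOR_FULL, LAC_OPERATOR_CORE)
fraction = len(LAC_OPERATOR_CORE) / len(LAC_OPERATOR_FULL)
print(f"SEQ ID NO:11 spans nucleotides {start}-{end} of SEQ ID NO:10 ({fraction:.0%} of its length)")
```

The check shows that SEQ ID NO:11 is a contiguous 16-nucleotide portion of the 28-nucleotide full-length operator, consistent with its description as "a portion thereof".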

The previously mentioned lac operator sequences are negatively regulated by the lac repressor. The corresponding repressor gene can be introduced into the host cell in a vector or through integration into the chromosome of a bacterium by known methods, such as by integration of the lacIq gene. See, e.g., Miller et al., supra; Calos, Nature 274:762–765 (1978). The vector encoding the repressor molecule may be the same vector that contains the expression control sequences and a gene or polynucleotide of interest, or it may be a separate vector.

The S-D sequences of the invention can routinely be inserted, using procedures known in the art, into any suitable expression vector that can replicate in gram-negative and/or gram-positive bacteria. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.). Suitable vectors and plasmids can be constructed from segments of chromosomal, non-chromosomal, and synthetic DNA sequences, such as various known plasmid and phage DNAs. Especially suitable vectors include plasmids of the pDS family. See Bujard et al., Methods in Enzymology 155:416–433 (1987). Additional examples of preferred plasmids include plasmids based on pBR322 and pBluescript (Stratagene, La Jolla, Calif.). Still other examples of preferred plasmids include pUC-based vectors, including pUC18 and pUC19 (New England Biolabs, Beverly, Mass.), and pREP4 (Qiagen Inc., Chatsworth, Calif.). Portions of vectors and plasmids encoding desired functions may also be combined to form new vectors with desired characteristics. For example, the origin of replication of pUC19 may be recombined with the kanamycin resistance gene of pREP4 to create a new vector with both characteristics.

Preferably, vectors and plasmids comprising one or more S-D sequences of the invention also contain sequences that allow replication of the plasmid to high copy number in the host bacterium of choice. Additionally, vector or plasmid embodiments of the invention that comprise expression control sequences may further comprise a multiple cloning site immediately downstream of the expression control sequences and the S-D sequence.

Vectors and plasmids comprising one or more S-D sequences of the invention may further comprise genes conferring antibiotic resistance. Preferred genes are those conferring resistance to ampicillin, chloramphenicol, and tetracycline. Especially preferred genes are those conferring resistance to kanamycin.

The optimized S-D ribosome binding site of the invention can also be inserted into the chromosome of gram-negative and gram-positive bacterial cells using techniques known in the art. In this case, selection agents such as antibiotics, which are generally required when working with vectors, can be dispensed with.

Proteins of interest that can be expressed using the S-D sequences, vectors, and host cells of the invention include prokaryotic, eukaryotic, viral, or artificial proteins. Such proteins include, but are not limited to: enzymes; hormones; proteins having immunoregulatory, antiviral, or antitumor activity; antibodies and fragments thereof (e.g., Fab, Fab′, F(ab′)₂, single-chain Fv, disulfide-linked Fv); and antigens. In preferred embodiments, the protein to be expressed is B. anthracis protective antigen (PA), a mutated protective antigen (mPA) (see, e.g., Sellman et al., J. Biol. Chem. 276(11):8371–8376 (2001)), TL3, or TL6. Any effective signal sequence may be used in combination with the gene or polynucleotide of interest. In a preferred embodiment, the ETB signal sequence is used to enhance the expression of soluble protein.

The S-D sequences of the present invention provide for increased efficiency of protein expression in prokaryotic systems. Efficient expression means that the level of protein expression to be expected when using the S-D sequences of the instant invention is generally higher than levels previously reported in the art. In preferred embodiments, the resulting expressed protein can be purified to greater than 90% purity as measured by RP-HPLC. Particularly preferred purity levels include 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, and nearly 100% purity, all of which are encompassed by the instant invention. It is expressly contemplated by the invention that the addition of one or more S-D sequences of the invention to any prokaryotic expression system, including but not limited to E. coli expression systems, will result in increased and more efficient protein expression.

The present invention also relates to methods of using the S-D sequences, vectors, plasmids, and host cells of the invention to produce proteins and fragments thereof. In one embodiment of the invention, a desired protein is produced by a method comprising:

-   (a) transforming a bacterium with a vector in which a polynucleotide encoding a desired protein is operably linked to an S-D sequence of the invention;
-   (b) culturing the transformed bacterium under suitable growth conditions; and
-   (c) isolating the desired protein from the culture.

In another embodiment of the invention, a desired protein is produced by a method comprising:

-   (a) inserting an S-D sequence of the invention and an expression control sequence into the chromosome of a suitable bacterium, wherein the S-D sequence and expression control sequence are each operably linked to a polynucleotide encoding a desired protein;
-   (b) cultivating the bacterium under suitable growth conditions; and
-   (c) isolating the desired protein from the culture.

The selection of a suitable host organism is determined by various factors that are well known in the art. Factors to be considered include, for example, compatibility with the selected vector, toxicity of the expression product, expression characteristics, necessary biological safety precautions, and costs.

Suitable host organisms include, but are not limited to, gram-negative and gram-positive bacteria, such as E. coli, B. subtilis, S. aureus, and S. typhimurium strains. Preferred E. coli strains include DH5α (Gibco-BRL, Gaithersburg, Md.), XL-1 Blue (Stratagene), and W3110 (ATCC No. 27325). Other E. coli strains that can be used according to the present invention include other generally available strains, such as E. coli 294 (ATCC No. 31446), E. coli RR1 (ATCC No. 31343), and M15.

EXAMPLES

The examples that follow are set forth to aid in understanding the invention but are not intended to, and should not be construed to, limit the scope of the invention in any way. The examples do not include detailed descriptions of conventional methods employed in the art, such as the construction of vectors, the insertion of genes encoding polypeptides of interest into such vectors, or the introduction of the resulting plasmids into bacterial hosts. Such methods are described in numerous publications and can be carried out using recombinant DNA technology methods that are well known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.).

Example 1 pHE6 Design

The S-D sequence used in pHE6 (SEQ ID NO:2) was based on the S-D sequence of the pHE4 expression vector (SEQ ID NO:17) (see U.S. Pat. No. 6,194,168), with three base pair changes made as indicated in FIG. 1. Additionally, the pHE6 plasmid encodes the aminoglycoside phosphotransferase protein (conferring kanamycin resistance) and the lacIq repressor, and includes a ColE1 replicon. Construction of the pHE4 plasmid, upon which the pHE6 plasmid is based, is described in U.S. Pat. No. 6,194,168.
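The comparison shown in FIG. 1 amounts to aligning the parent and modified S-D regions and listing the positions that differ. The Python sketch below assumes the two sequences are already aligned with no insertions or deletions; the sequences used are hypothetical placeholders with three substitutions, not SEQ ID NO:17 or SEQ ID NO:2, and the function names are likewise illustrative.

```python
from typing import List, Tuple


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, variant base) for each mismatch.

    Assumes the two sequences are already aligned and of equal length (no indels).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference.upper(), variant.upper()))
        if r != v
    ]


def match_mask(reference: str, variant: str) -> str:
    """Mark matching bases with '|' and substitutions with ' ', like a highlight track."""
    return "".join("|" if r == v else " " for r, v in zip(reference.upper(), variant.upper()))


# Hypothetical placeholders (NOT SEQ ID NO:17 / SEQ ID NO:2).
parent = "TTAAGAAGGAGATATACAT"
modified = "TTAAGAGGGAGATAAACCT"

print(parent)
print(match_mask(parent, modified))
print(modified)
for pos, ref_base, new_base in substitutions(parent, modified):
    print(f"position {pos}: {ref_base} -> {new_base}")
```

Substituting the actual sequences from the sequence listing would reproduce the three changes highlighted in FIG. 1, provided they align without gaps.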

Example 2 Method of Making and Purifying PA in Escherichia coli K-12

Using the following method, a post-purification final yield of soluble PA greater than 2 g from 1 kg of E. coli cell paste (approximately 150 mg/L) can be obtained from either shake flasks or bioreactors. See FIG. 4. The purity of such soluble PA, as judged by RP-HPLC analysis, is greater than 96–98%.
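To put the reported yield of approximately 150 mg/L in context, the following Python sketch computes simple fold ratios against the prior-art E. coli PA yields cited in the Background (0.5 mg/L at 70% purity, 2 mg/L, and 125 mg/L). These are ratios of reported titers only; they do not correct for differences in purity, scale, or method between the studies, and the function name `fold_over` is hypothetical.

```python
# Approximate PA yield reported in this example, in mg per liter of culture.
PHE6_PA_MG_PER_L = 150.0

# Prior-art yields as cited in the Background section (mg/L).
PRIOR_ART_MG_PER_L = {
    "Sharma et al. (1996)": 0.5,
    "Gupta et al. (1999)": 2.0,
    "Chauhan et al. (2001)": 125.0,
}


def fold_over(reference_mg_per_l: float, new_mg_per_l: float = PHE6_PA_MG_PER_L) -> float:
    """Ratio of a new titer to a reference titer (both in mg/L)."""
    if reference_mg_per_l <= 0:
        raise ValueError("reference titer must be positive")
    return new_mg_per_l / reference_mg_per_l


for source, titer in PRIOR_ART_MG_PER_L.items():
    print(f"{source}: {titer} mg/L -> {fold_over(titer):.1f}-fold")
```

The comparison suggests a large improvement over the lower prior-art reports and a more modest one over the 125 mg/L report, before any adjustment for purity.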

The bacterial host strain used for the production of recombinant wild-type PA from a recombinant plasmid DNA molecule is an E. coli K-12-derived strain. To express protein from the expression vectors, E. coli cells were transformed with the expression vectors and grown overnight at 30 °C in 4 L shake flasks containing 1 L of Luria broth supplemented with kanamycin. The cultures were started at an optical density at 600 nm (OD600) of 0.1. IPTG was added to a final concentration of 1 mM when the culture reached an OD600 of between 0.4 and 0.6, and the IPTG-induced cultures were grown for an additional 3 hours. Cells were then harvested using methods known in the art, and the level of protein was detected using Western blot analysis. Soluble PA was then extracted from the periplasm and clarified by conventional means. The clarified supernatant was purified using a Q Sepharose HP column (Amersham), concentrated, and further purified using a Biogel Hydroxyapatite HP column (Bio-Rad). Using the expression control sequence M+D1 (SEQ ID NO:8), high levels of repression in the absence of IPTG and high levels of induced expression in the presence of IPTG were obtained.
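The induction step in this protocol involves some simple arithmetic, sketched below in Python: checking whether an OD600 reading falls within the stated 0.4–0.6 window, estimating the number of doublings from the starting OD600 of 0.1 (assuming OD600 is proportional to cell density in this range), and computing the volume of IPTG stock needed for a 1 mM final concentration in a 1 L culture. The 1 M stock concentration and all function names are hypothetical choices, not part of the described method.

```python
import math

START_OD600 = 0.1                 # starting optical density stated in the text
INDUCTION_WINDOW = (0.4, 0.6)     # OD600 range at which IPTG was added


def ready_to_induce(od600: float, window=INDUCTION_WINDOW) -> bool:
    """True if the reading lies within the induction window (inclusive)."""
    low, high = window
    return low <= od600 <= high


def doublings(od_from: float, od_to: float) -> float:
    """Population doublings between two readings, assuming OD600 scales linearly with cell density."""
    return math.log2(od_to / od_from)


def stock_volume_ml(culture_ml: float, final_mm: float, stock_mm: float) -> float:
    """Stock volume that yields final_mm after addition, accounting for the added volume."""
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * culture_ml / (stock_mm - final_mm)


# Hypothetical 1 M (1000 mM) IPTG stock added to the 1 L culture for 1 mM final.
volume = stock_volume_ml(culture_ml=1000.0, final_mm=1.0, stock_mm=1000.0)
print(f"doublings to induction: {doublings(START_OD600, 0.4):.2f}-{doublings(START_OD600, 0.6):.2f}")
print(f"IPTG stock to add: {volume:.3f} mL")
```

Ignoring the added volume (C1V1 = C2V2) would give exactly 1 mL; the correction here is negligible at this dilution but matters for less concentrated stocks.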

Deposit of Microorganisms

Plasmid pHE6 was deposited with the American Type Culture Collection, University Boulevard, Manassas, Va. 20110–2209 on Jun. 20, 2002 and was given Accession No. PTA-4474. This deposit has been accepted under the provisions of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure.

The disclosures of all publications (including patents, patent applications, journal articles, laboratory manuals, books, or other documents) cited herein are hereby incorporated by reference in their entireties.

The present invention is not to be limited in scope by the specific embodiments described herein, which are intended as illustrations of individual aspects of the invention. Functionally equivalent methods and components are within the scope of the invention, in addition to those shown and described herein, and will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.

1. An isolated polynucleotide comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4–13 of SEQ ID NO:2; and (c) SEQ ID NO:18; wherein said Shine-Dalgarno sequence is between 4 and 14 nucleotides upstream from a start codon.
2. The isolated polynucleotide of claim 1, wherein the Shine-Dalgarno sequence is (a).
3. The isolated polynucleotide of claim 1, wherein the Shine-Dalgarno sequence is (b).
4. The isolated polynucleotide of claim 1, wherein the Shine-Dalgarno sequence is (c).
5. A vector comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4–13 of SEQ ID NO:2; and (c) SEQ ID NO:18; wherein said Shine-Dalgarno sequence is between 4 and 14 nucleotides upstream from a start codon.
6. The vector of claim 5, wherein the Shine-Dalgarno sequence is (a).
7. The vector of claim 5, wherein the Shine-Dalgarno sequence is (b).
8. The vector of claim 5, wherein the Shine-Dalgarno sequence is (c).
9. The vector of claim 5, wherein said Shine-Dalgarno sequence is operably associated with a polynucleotide encoding a protein or fragment thereof.
10. The vector of claim 9, wherein said polynucleotide encodes SEQ ID NO:4.
11. The vector of claim 9, wherein said polynucleotide is operably associated with an expression control sequence.
12. A method of producing a vector comprising inserting the Shine-Dalgarno sequence of claim 1 into a vector.
13. A method of producing a host cell comprising transducing, transforming, or transfecting a host cell with the vector of claim 5.
14. A recombinant host cell comprising the Shine-Dalgarno sequence of claim 1.
15. A recombinant host cell comprising the vector of claim 5.
16. A recombinant host cell comprising the vector of claim 9.
17. A method of producing protein, comprising: (a) culturing the host cell of claim 16 under conditions suitable to produce the protein or fragment thereof; and (b) recovering the protein or fragment thereof from the cell culture.
18. The method of claim 17, wherein said polynucleotide encodes SEQ ID NO:4.
19. The isolated polynucleotide sequence of claim 1, wherein said Shine-Dalgarno sequence is between 8 and 10 nucleotides upstream from a start codon.
20. The vector of claim 5, wherein said Shine-Dalgarno sequence is between 8 and 10 nucleotides upstream from a start codon.